Financial information extraction using pre-defined and user-definable templates in the Lolita system
Abstract
Today, financial operators have access to an extremely large amount of data, both quantitative and qualitative, real-time and historical, and can use this information to support their decision-making process. Quantitative data are largely processed by automatic computer programs, often based on artificial intelligence techniques, that produce quantitative analyses such as historical price analysis or technical analysis of price behaviour. By contrast, little progress has been made in the processing of qualitative data, which mainly consist of financial news articles from financial newspapers or on-line news providers. As a result, financial market players are overloaded with qualitative information that is potentially extremely useful but, due to lack of time, is often ignored. The goal of this work is to reduce the qualitative data-overload of financial operators. The research involves identifying the information in the source financial articles that is relevant to the financial operators' investment decision-making process and implementing the associated templates in the LOLITA system. The system should process a large number of source articles and extract specific templates according to the relevant information located in those articles. The project also involves the design and implementation in LOLITA of a user-definable template interface that allows users to easily design new templates using sentences in natural language. This allows user-defined information extraction from source texts, which differs from most existing information extraction systems, where the developers must code the templates directly into the system. The results of the research have shown that the system performed well in extracting financial templates from source articles, which would allow financial operators to reduce their qualitative data-overload.
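In information extraction, a template is a record of named slots that the system fills with facts found in a source text. The sketch below only illustrates that idea. The `TakeoverTemplate` class, its slots and the regular expression are hypothetical choices made for illustration. The shallow pattern matching used here is an assumption and is not a description of how LOLITA analyses text.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TakeoverTemplate:
    """Hypothetical template: named slots describing one takeover event."""
    acquirer: Optional[str] = None
    target: Optional[str] = None
    value: Optional[str] = None


# A capitalised name of one or more words, e.g. "Acme Corp".
_NAME = r"[A-Z][\w&]*(?: [A-Z][\w&]*)*"

# Toy surface pattern (an assumption for illustration):
# "<NAME> acquired <NAME> [for <VALUE>]"
_TAKEOVER = re.compile(
    rf"(?P<acquirer>{_NAME}) (?:agreed to acquire|acquired|bought) "
    rf"(?P<target>{_NAME})"
    r"(?: for (?P<value>[$£€]?\d+(?:[.,]\d+)*(?: (?:million|billion))?))?"
)


def extract_takeovers(text: str) -> List[TakeoverTemplate]:
    """Fill one TakeoverTemplate per pattern match; unmatched slots stay None."""
    return [
        TakeoverTemplate(m.group("acquirer"), m.group("target"), m.group("value"))
        for m in _TAKEOVER.finditer(text)
    ]


if __name__ == "__main__":
    article = "Acme Corp agreed to acquire Widget Ltd for $2.5 billion. Globex acquired Initech."
    for template in extract_takeovers(article):
        print(template)
```

A structure like this, written directly in code by a developer, corresponds to the hand-coded approach that the abstract contrasts with LOLITA's user-definable interface. That interface instead lets users describe the template in natural-language sentences.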
The results have also shown that the user-definable template interface is a viable approach to user-defined information extraction. A trade-off has been identified between the ease of use of the user-definable template interface and the loss of performance compared to hand-coded templates.
Acknowledgements
I would like to thank my supervisor Richard G. Morgan for his advice and support throughout the three years over which this research has been conducted. I would also like to thank Russell J. Collingham for his help and support and Stephen Eckett for his invaluable advice on the financial aspects of the thesis. I am also grateful to all past and present members of the Laboratory for Natural Language Engineering who have helped provide such a pleasant research and social environment. Thank you to all those people who commented on the various drafts of this thesis, particularly Luisa Mich and Luigi Colazzo. Financial assistance for this project was received from the Opera Universitaria Society of the University of Trento. I would also like to thank the Department of Economics of the University of Trento for awarding me the University of Trento studentship for postgraduate studies abroad. I would also like to thank all my friends who supported me during these three years, particularly Edy, Mika, Marianna, Paolo, Wendy, Janice, Giacomo, Max, Xavier and Sergio. Finally, I would like to thank my Mum for her support during my time at the University.
Declaration
The material contained within this thesis has not previously been submitted for a degree at the University of Durham or any other university. The research reported within this thesis has been conducted by the author unless indicated otherwise. The copyright of this thesis rests with the author. No quotation from it should be published without his prior written consent, and information derived from it should be acknowledged.